1
Identification of Rare Events in EHR Data
Session 217, February 14, 2018
Alexander Turchin, MD, MS, FACMI
Brigham and Women’s Hospital and Harvard Medical School
2
Alexander Turchin, MD, MS, FACMI
Has no real or apparent conflicts of interest to report.
Conflict of Interest
3
• Importance of EMR data
• Natural language processing
• Rare events: the NLP approach
• What happened next
• How to explain it
Agenda
4
• Explain importance of identification of rare clinical events in EMR
data
• Define different technologies that can be used to extract
information from narrative electronic documents
• Compare efficacy of natural language processing technologies for
identification of rare clinical events
• Discuss reasons for differences in performance between natural
language processing technologies
Learning Objectives
5
Importance of
EMR Data
6
EMR = DATA
7
EMR = DATA
DIAGNOSES
MEDICATIONS
LABS
VITALS
8
EMR = DATA
DIAGNOSES
MEDICATIONS
LABS
VITALS
9
What’s under the Hood?
10
What’s under the Hood?
11
Caveat Emptor
EMR data are like a box
of chocolates…
You never know what
you’re gonna get.
12
Structured vs. Narrative
13
Structured vs. Narrative
STRUCTURED NARRATIVE
Structured: 9,819
Narrative: 13,993
Documented in both: 5,627 (30.9% of the total)
Medication
intensifications
Turchin A et al (2009) JAMIA; 16:362-370
14
Structured vs. Narrative
STRUCTURED NARRATIVE
> 99%
< 1%
< 1%
Home blood
pressure
Kramer M et al (2010) BMC Health Services Research; 10:139
15
Structured vs. Narrative
o More details
o Includes context
o Describes clinical reasoning
o Non-codable information
16
Natural Language
Processing
17
Natural Language Processing
Mr. Smith comes today
with chief complaint of
back pain. Denies
history of trauma,
urinary retention or
weakness.
back pain
Chief complaint
trauma
weakness
Negated
Natural Language
Processing
18
Natural Language Processing
19
NLP: 1-2-3
1. Identify Examples: collect examples (typically through manual
record review) of how the concept being sought is documented
in the EMR. Hundreds of examples are usually necessary for a
comprehensive description.
2. Learn from Examples: based on the examples, create a
language model that can recognize the concept being sought.
This step can be manual (e.g. in rule-based systems) or
automated (in machine-learning-based systems).
3. Evaluate the Language Model: test the language model on a
new set of examples that were not used to create it, to determine
its accuracy. Several dozen examples are typically necessary to
obtain sufficiently narrow confidence intervals.
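The "several dozen" figure in the evaluation step can be motivated with a quick back-of-the-envelope calculation. A minimal sketch (using the normal approximation to the binomial confidence interval; the 0.9 sensitivity and sample sizes are illustrative, not figures from the talk):

```python
import math

def ci_95(p_hat: float, n: int) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for a proportion
    estimated from n examples, clipped to [0, 1]."""
    half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

# Interval around an observed sensitivity of 0.9 narrows as the
# evaluation set grows.
for n in (10, 50, 200):
    lo, hi = ci_95(0.9, n)
    print(f"n={n}: ({lo:.2f}, {hi:.2f})")
```

With only 10 evaluation examples the interval spans roughly a quarter of the scale; a few dozen examples are needed before the interval becomes narrow enough to be informative.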
20
Rare Events:
the NLP Approach
21
Are not documented persistently in multiple documents, but
nevertheless could impact patient care and outcomes
• Splenectomy: requires specific immunizations (e.g.
pneumococcal) to prevent fatal illness
• Anaphylaxis to penicillin: a life-threatening reaction to a common
medication
• Rejection of treatment recommendation by the patient: can
impact both future treatment decisions and long-term outcomes
Rare Events
22
• Can present a particular challenge for design of NLP tools
because it can be difficult to collect enough examples to make
the tools sufficiently accurate
• This problem is not unique to NLP
Rare Events
23
Step 1: Identify Examples
24
• Non-adherence to blood pressure medications
– Significantly elevated BP (≥ 150/100)
– No intensification of anti-hypertensive medications
• Blood pressures measured at home
– Notes with blood pressure ranges
(e.g. 120-130/70-80)
• Patients declining insulin therapy
– Elevated blood glucose (HbA1c > 7%)
– No insulin treatment started
Data Enrichment
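The second enrichment filter above (notes containing blood pressure ranges such as 120-130/70-80) can be sketched as a simple regular-expression pre-filter. The pattern below is an illustration of the idea, not the study's actual enrichment criteria:

```python
import re

# Matches blood pressures written as ranges ("120-130/70-80",
# "120/70-80") versus single readings ("150/100").
BP_PATTERN = re.compile(r"\b(\d{2,3})(?:-(\d{2,3}))?/(\d{2,3})(?:-(\d{2,3}))?\b")

def has_bp_range(note: str) -> bool:
    """True if the note contains a blood pressure written as a range,
    which suggests home-measured readings."""
    for m in BP_PATTERN.finditer(note):
        if m.group(2) or m.group(4):  # a dash appeared on either side
            return True
    return False
```

For example, `has_bp_range("Home readings 120-130/70-80")` returns `True`, while a single clinic reading like `"BP 150/100 today"` does not match. Notes that pass such a filter are far more likely to contain the rare concept, making manual review for example collection much cheaper.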
25
Step 2: Create a Language Model
Manually designed
(rule-based) systems
Automatically developed
(machine-learning-based)
systems
26
Step 2: Create a Language Model
Manually designed (rule-based) systems:
• Can incorporate background knowledge not directly found in
the examples
• May not need as many examples
Machine-learning-based systems:
• Fast
• Can model non-linear relationships
27
Classification Methods
• Typically employ bag-of-words approach (i.e. do not analyze
spatial relationships between words in a sentence).
• Naïve Bayes
• Logistic Regression
• Support Vector Machines (SVMs)
Machine Learning NLP
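The bag-of-words idea above can be made concrete with a from-scratch multinomial Naïve Bayes classifier. This is a minimal sketch on toy sentences (the training examples are invented for illustration); word order is deliberately ignored:

```python
import math
from collections import Counter

def train_nb(docs):
    """Multinomial Naive Bayes with add-one smoothing over a bag of words.
    `docs` is a list of (text, label) pairs."""
    counts = {}        # label -> Counter of word frequencies
    priors = Counter() # label -> number of training documents
    for text, label in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for c in counts.values() for w in c}
    return counts, priors, vocab

def predict_nb(model, text):
    counts, priors, vocab = model
    total_docs = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label, wc in counts.items():
        score = math.log(priors[label] / total_docs)
        denom = sum(wc.values()) + len(vocab)  # add-one smoothing
        for w in text.lower().split():
            score += math.log((wc[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    ("patient declined insulin therapy", "decline"),
    ("refused to start insulin", "decline"),
    ("started insulin today", "other"),
    ("continues metformin no changes", "other"),
]
model = train_nb(docs)
```

Because only word counts matter, `predict_nb(model, "patient refused insulin")` scores higher for "decline" even though that exact sentence never appeared in training; the same count-based framing underlies the logistic regression and SVM variants.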
28
Naïve Bayes
29
Logistic Regression
30
Support Vector Machines
31
Sequence Labeling Methods
• These methods are “aware” of the sequence of words in a
sentence and use it to inform classification.
• Conditional Random Fields (CRFs)
• Recurrent Neural Networks (RNNs)
Machine Learning NLP
32
Conditional Random Fields
Patient refused insulin
Sentence
Labels
33
Neural Networks
34
Deep Learning
Input Layer
Output Layer
Hidden Layers
35
Recurrent Neural Network
Input Layer
Output Layer
Hidden Layers
36
Canary
• A GUI-based platform that allows users without a computer science
background to create [rule-based] NLP tools to identify concepts
of interest in narrative electronic data
• Supports advanced NLP features:
a) Concept-value extraction (e.g. ejection fraction)
b) Identification of concepts across sentence boundaries
c) Parallel processing
d) Portability of language models between Canary
installations
e) Can analyze text in multiple languages
• Freely available at
37
Canary
Language
models are
created using
word classes
(semantic
groupings) and
phrase
structures (rules
defining how a
concept can be
documented in
the text).
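The word-class/phrase-structure idea can be illustrated with a toy matcher. The syntax below is invented for illustration and is not Canary's actual model format:

```python
# Word classes: semantic groupings of interchangeable words.
WORD_CLASSES = {
    "PATIENT": {"patient", "pt", "she", "he"},
    "DECLINE": {"declined", "refused", "refuses", "rejects"},
    "INSULIN": {"insulin", "lantus", "glargine"},
}
# Phrase structure: one rule for how the concept can be documented,
# as an ordered sequence of word classes.
PHRASE_STRUCTURE = ["PATIENT", "DECLINE", "INSULIN"]

def classify(word):
    """Map a word to its word class, or None."""
    return next((c for c, ws in WORD_CLASSES.items() if word in ws), None)

def matches(sentence):
    """True if the rule's word classes appear in order in the sentence,
    with any number of intervening words."""
    classes = [classify(w) for w in sentence.lower().split()]
    i = 0
    for c in classes:
        if i < len(PHRASE_STRUCTURE) and c == PHRASE_STRUCTURE[i]:
            i += 1
    return i == len(PHRASE_STRUCTURE)
```

One rule with three small word classes already covers many surface variants ("pt refuses lantus", "she declined insulin"), which is why a clinician-built model of a few hundred classes and structures can generalize well.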
38
Step 3: Evaluate
True
positives
identified
by NLP
Sensitivity
(Recall)
All true positives
39
Step 3: Evaluate
True
positives
identified
by NLP
Positive Predictive
Value (Precision)
All concepts identified by NLP
40
Step 3: Evaluate
41
Why F1?
Reality: rare event (5% prevalence). Model: labels everything positive,
as if the event were common.
Sensitivity: 100%
PPV: 5%
Arithmetic mean: 52.5%
Harmonic mean (F1): 9.5%
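The point of the slide above can be checked in a few lines: for a degenerate model that labels every sentence positive on a rare event, the arithmetic mean of sensitivity and PPV looks deceptively decent, while the harmonic mean (F1) correctly punishes the useless precision.

```python
def f1(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)

# Label-everything-positive model on a 5%-prevalence event:
# perfect sensitivity, PPV equal to the prevalence.
sens, ppv = 1.00, 0.05
print((sens + ppv) / 2)  # arithmetic mean: 0.525
print(f1(sens, ppv))     # harmonic mean: ~0.095
```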
42
What Happened Next
43
• Training dataset: 50,046 documents (2,660,475 sentences).
• Evaluation dataset: 1,503 documents (86,487 sentences).
• Prevalence of insulin decline by patients: 0.02% in both sets
(at the sentence level)
Nitty-Gritty: Data
44
• All machine learning models were trained using both word-based
and lemma-based features.
• Naïve Bayes and Logistic Regression never reached an F1 of 0.5 on
the training set and were not evaluated further.
• The regularization parameter of the SVM model was optimized
using cross-validation.
• We also used the Synthetic Minority Oversampling Technique
(SMOTE) to compensate for the low prevalence of true positives
when training SVMs.
• The regularization parameter and decision threshold of the CRF
and RNN models were optimized using cross-validation.
Nitty-Gritty: Machine Learning
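SMOTE, mentioned above, synthesizes new minority-class examples by interpolating between a minority point and a nearby minority neighbor in feature space. A minimal pure-Python sketch (the feature vectors are toy data, and real implementations sample among k nearest neighbors rather than only the single nearest):

```python
import random

def smote(minority, n_new=4, seed=0):
    """Minimal SMOTE sketch: create n_new synthetic minority-class
    points by interpolating between a random minority sample and its
    nearest minority neighbor (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = min((p for p in minority if p != a),
                key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))
        gap = rng.random()  # random point on the segment from a to b
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_points = smote(minority)
```

Each synthetic point lies on a segment between two real minority examples, so the oversampled training set stays plausible while giving the classifier many more positive examples to learn from.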
45
• Canary language model was designed by a clinician with no
formal computer science training who had access to the same
training set used by the machine learning models.
• The final model contained 148 word classes and 284 phrase
structures
• Using this language model, Canary processed text at 1 MB (c.
200 documents) per CPU core per minute
Nitty-Gritty: Canary
46
Accuracy
System   Sensitivity   PPV     F1 score
SVM      0.714         0.526   0.606
CRF      0.563         0.474   0.514
RNN      0.706         0.632   0.667
Canary   0.955         1.000   0.977
47
How to Explain It
48
• Non-linear boundaries (SVM vs. Logistic Regression).
• Oversampling (SMOTE for SVM)
• Taking context into account (RNN)
Machine Learning – What Worked
49
Man vs. Machine
Natural food is better!
50
Man vs. Machine
Natural medicine is better!
51
Man vs. Machine
Natural intelligence is worse?
52
Man vs. Machine
• Walking on two feet
• Stitching a T-shirt
• Translating to another language
• Understanding and
communicating emotions
• Etc.
• Mathematical calculations
• Chess / Go
53
Man vs. Machine
• Background knowledge of English
• Background knowledge of subject
matter
• Insufficient number of examples
• Imbalance between positive and
negative examples
Ability to generalize
54
Man vs. Machine
• Canary could integrate information
far apart in the sentence
• Insufficient number of examples
• Imbalance between positive and
negative examples
Ability to take context into account
55
• One way to improve machine-learning NLP is to create large
publicly available repositories of marked-up text (corpora)
• More likely to be helpful for basics (e.g. named entities) and less
likely for complex concepts representing clinical workflow
How could we do better?
56
• 0.02% prevalence at the sentence level
• 0.9% prevalence at the document level
• 30% prevalence at the patient level
Rare in Text ≠ Rare in Life
57
• Rare events are an important category of EMR data that may
require a special approach to identification
• At the current state of technology, human-designed NLP tools can
achieve significantly higher accuracy than machine learning
methods, though they can take time to develop
• Several techniques can improve performance of machine
learning methods, but further improvements are needed
Conclusions
58
• Nicholas Alexander
• Lee-Shing Chang
• Wendong Ge
• Matt Goldberg
• Peter Goldberg
• Naoshi Hosomura
Thank you
• Victor J. Lei
• Shervin Malmasi
• Stephen Skentzos
• Alex Solomonoff
• Dmitriy Timerman
• Huabing Zhang
Funding: Sanofi
59
Don’t forget to complete online session evaluation!
Questions?